Detecting relic gravitational waves in the CMB: A statistical bias 
Analyzing the imprint of relic gravitational waves (RGWs) on the cosmic microwave background (CMB) power spectra provides a way to detect the RGW signal. In this Letter, we discuss a statistical bias that could arise in the data analysis and that tends to cause the RGWs to be overlooked. We also explain why this bias exists and how to avoid it.
I. INTRODUCTION

A stochastic background of relic gravitational waves was produced in the very early Universe by the superadiabatic amplification of the zero-point quantum fluctuations of the gravitational field [1, 2]. The spectrum of relic gravitational waves spans a wide range of frequencies, and their detection provides a direct way to study the physics of the early Universe.

Recently, there have been several experimental efforts to constrain the amplitude of relic gravitational waves at different frequencies. Among the various direct observations, LIGO S5 has so far obtained the most stringent bound, $\Omega_{\rm gw}(f) < 6.9 \times 10^{-6}$ around $f \sim 100$ Hz [3], which will be much improved by future observations, including the third-generation Einstein Telescope [4]. Timing studies of millisecond pulsars by the PPTA and EPTA teams have also reported upper limits $\Omega_{\rm gw}(f) < 10^{-8}$ at $f \sim 1/{\rm yr}$ [5, 6]. In addition, there are two bounds on the integral $\int \Omega_{\rm gw}(f)\, d\ln f < 1.5 \times 10^{-5}$, obtained from big bang nucleosynthesis [7] and from cosmic microwave background observations [8].

In this paper, we focus on the detection of relic gravitational waves by cosmic microwave background (CMB) radiation observations. RGWs leave well-understood imprints on the anisotropies in the temperature and polarization of the CMB [9, 10]. Theoretical analysis of these imprints, together with the data from CMB experiments (the T, C, E and B power spectra, where C denotes the TE cross-correlation), allows one to determine the RGW background by constraining two parameters: the tensor-to-scalar ratio $r$ and the spectral index $n_t$. Current CMB observations by the WMAP satellite place an interesting bound, $r < 0.20$ [11], assuming $n_t = -r/8$; this analysis has been generalized in [12]. These bounds are equivalent to constraints on the energy density $\Omega_{\rm gw}(f)$ of relic gravitational waves in the lowest frequency range, $f \sim 10^{-17}$ Hz.

Detecting relic gravitational waves remains one of the most important tasks for upcoming CMB observations (see [13] for reviews). Because of various large contaminations, in the near future we can only expect to detect an RGW signal at a relatively low signal-to-noise ratio ($S/N$). Such a result would guide detections in the more distant future.

In any data analysis, we expect the maximum of the posterior probability density function (pdf) to be an unbiased estimate of the 'true' values of the parameters. This is automatically satisfied when the $S/N$ is high. However, when the $S/N$ is low, the maximum of the pdf can give a biased guide to the 'true' values. Such a bias can be generated either by systematics or by the statistics, and it should be avoided in any data analysis.

In this Letter, we point out that a statistical bias could exist in the CMB data analysis for the detection of RGWs. We also explain why the bias exists and suggest a way to avoid it.



II. THE STATISTICAL BIAS 

The primordial power spectrum of relic gravitational 
waves can be simply described by the following power-law 
formula: 



$$P_t(k) = A_t(k_0)\,(k/k_0)^{n_t}, \tag{1}$$
where $k_0$ is the pivot wavenumber, which can be chosen arbitrarily, $A_t(k_0)$ is the amplitude of RGWs, and $n_t$ is the spectral index. Physical models of the early Universe predict $n_t$ to be quite close to zero. As usual, we define the tensor-to-scalar ratio $r = A_t(k_0)/A_s(k_0)$, where $A_s(k_0)$ is the amplitude of the density perturbations. Assuming, as in this Letter, that $A_s(k_0)$ is known, $r$ is simply $A_t(k_0)$ normalized by the constant $A_s(k_0)$.
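A minimal sketch of Eq. (1) and of the definition of $r$ is given below, assuming $n_s = 1$ so that $A_s$ does not depend on the pivot (as in the input model adopted below). It evaluates the power-law spectrum, recovers $r$ as $A_t(k_0)/A_s(k_0)$, and checks that moving the pivot only rescales the amplitude, which is why $k_0$ can be chosen freely. The function name and the pivot values are illustrative choices, not taken from the Letter.

```python
import numpy as np

def tensor_power_spectrum(k, r, n_t, A_s, k0):
    """Power-law RGW spectrum of Eq. (1), with A_t(k0) = r * A_s(k0)."""
    A_t = r * A_s                                   # tensor amplitude at the pivot k0
    return A_t * (np.asarray(k, dtype=float) / k0) ** n_t

A_S = 2.445e-9                  # scalar amplitude A_s(k0), treated as known
K0 = 0.002                      # illustrative pivot in Mpc^-1; Eq. (1) allows any choice
k = np.array([1e-4, K0, 0.1])   # wavenumbers in Mpc^-1

P_t = tensor_power_spectrum(k, r=0.05, n_t=0.0, A_s=A_S, k0=K0)
r_recovered = tensor_power_spectrum(K0, 0.05, 0.0, A_S, K0) / A_S   # r = A_t(k0)/A_s(k0)

# Moving the pivot from K0 to K1 only rescales the amplitude,
# A_t(K1) = A_t(K0) (K1/K0)^{n_t}; with n_s = 1, A_s is the same at both pivots.
N_T, K1 = -0.1, 0.05
r_at_k1 = 0.05 * (K1 / K0) ** N_T
same_spectrum = np.allclose(tensor_power_spectrum(k, 0.05, N_T, A_S, K0),
                            tensor_power_spectrum(k, r_at_k1, N_T, A_S, K1),
                            rtol=1e-12, atol=0.0)
print(P_t, r_recovered, same_spectrum)
```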

To discuss the statistical bias in the detection of RGWs, let us simulate observational data for the Planck satellite, considering only the Planck instrumental noise in the 143 GHz frequency channel [14]. We adopt the 'input' cosmological model $\Omega_b h^2 = 0.02267$, $\Omega_c h^2 = 0.1131$, $\Omega_\Lambda = 0.726$, $\tau_{\rm reion} = 0.084$, $h = 0.705$, $A_s = 2.445 \times 10^{-9}$ and $n_s = 1$. The RGW parameters are taken to be $r = \hat r = 0.05$ and $n_t = \hat n_t = 0$, where hats denote input values. As discussed in our previous paper [15], such a small $r$ is expected to be detected at the $2\sigma$ level for the assumed noise level.

Based on this input cosmological model and the assumed noise level, we simulate 500 data samples. For every sample, we probe the likelihood function by applying the Markov Chain Monte Carlo (MCMC) method. In the data analysis, we assume that all parameters except $r$ and $n_t$ are fixed at their input values. For the parameter $n_t$, one usually presumes the relation $n_t = n_s - 1$ or $n_t = -r/8$ in the data analysis [16, 17]. However, these assumptions depend on specific cosmological models. If they are not true but are presumed anyway, the final conclusion of the data analysis would deviate from the real physics.

To avoid this danger, the natural approach is to treat $r$ and $n_t$ as free parameters. We choose flat priors on them in the ranges $r \in [0, 1]$ and $n_t \in [-3, 3]$. We adopt the best-pivot wavenumber, which is $k_0 = 0.0006\ {\rm Mpc}^{-1}$ for the input model and the assumed noise level [18].

The most interesting final results are the maxima of the 1-dimensional posterior pdfs for the parameters $r$ and $n_t$, which we denote by $r_{\rm ML}$ and $n_{t,\rm ML}$. Their values depend on the simulated data and differ from sample to sample. We expect the distributions of these 500 values of $r_{\rm ML}$ and $n_{t,\rm ML}$ to be centered on the input values, but this may not be the case in a real analysis. In Fig. 1 we plot the distributions of $r_{\rm ML}$ and $n_{t,\rm ML}$ with blue shading. The distribution of $n_{t,\rm ML}$ peaks at zero, the input value. However, the distribution of $r_{\rm ML}$ clearly piles up toward $r = 0$ and is biased away from the input value $r = 0.05$. This suggests that if we carry out the data analysis in this way, the result tends to deviate from the 'true' value of $r$ and to overlook the RGWs.

FIG. 1: The distributions of $r_{\rm ML}$, $n_{t,\rm ML}$ and $z_{\rm ML}$ for the 500 simulated samples. The blue shading shows the results obtained by adopting $r$ and $n_t$ as free parameters with flat priors on them. The red columns show the results obtained by adopting $r$ and $z$ as free parameters with flat priors on them.

III. UNDERSTANDING THE STATISTICAL BIAS

It is important to understand why this statistical bias exists. To do so, let us carry out the following analytical approximation of the likelihood analysis.

The primordial power spectrum of RGWs in Eq. (1) can be rewritten as

$$P_t(k) = A_t(k_0)\,(k/k_0)^{n_t} = A_s(k_0)\, r \exp\left[n_t \ln(k/k_0)\right], \tag{2}$$

which can be approximated as

$$P_t(k) \simeq A_s(k_0)\left[r + r n_t \ln(k/k_0)\right]. \tag{3}$$

In this approximation, we have used $|n_t| \ll 1$ (more precisely, $|n_t \ln(k/k_0)| \ll 1$ over the relevant range of $k$).

The total CMB power spectra $C_\ell^Y$ ($Y = T, C, E, B$) include the contributions of density perturbations and gravitational waves, i.e.,

$$C_\ell^Y = C_{\ell,s}^Y + C_{\ell,t}^Y, \tag{4}$$

where $C_{\ell,s}^Y$ and $C_{\ell,t}^Y$ are the contributions of density perturbations and gravitational waves, respectively. Note that $C_{\ell,s}^B = 0$. Using $|n_t| \ll 1$, the spectra $C_{\ell,t}^Y$, as functions of $r$ and $n_t$, can be approximated as [18]

$$C_{\ell,t}^Y \simeq \tilde C_\ell^Y\left[r + r n_t \ln(\ell/\ell_0)\right]. \tag{5}$$

Here $\tilde C_\ell^Y \equiv C_{\ell,t}^Y(r = 1, n_t = 0)$, and the best-pivot multipole is $\ell_0 \simeq k_0 \times 10^4\ {\rm Mpc}$ [18]. So $P_t(k)$ and $C_{\ell,t}^Y$ are both linear combinations of the parameters $r$ and $r n_t$.
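The linearization behind Eqs. (3) and (5) is just $\exp(n_t b_\ell) \simeq 1 + n_t b_\ell$ with $b_\ell = \ln(\ell/\ell_0)$. The quick check below assumes, purely for illustration, a multipole range $2 \le \ell \le 100$ and a pivot $\ell_0 = 6$ (neither is taken from the Letter's Planck analysis). It shows that the approximate shape is exactly linear in the pair $(r, z = r n_t)$ and how its error grows with $|n_t|$.

```python
import numpy as np

ell = np.arange(2, 101)     # assumed multipole range for this sketch
ell0 = 6.0                  # assumed best-pivot multipole
b = np.log(ell / ell0)      # b_ell = ln(ell / ell0)

def exact_factor(r, n_t):
    """Power-law shape r * (ell/ell0)^{n_t} = r * exp(n_t * b), as in Eq. (2)."""
    return r * np.exp(n_t * b)

def linear_factor(r, z):
    """Linearized shape of Eqs. (3)/(5): r + z * b, linear in (r, z) with z = r * n_t."""
    return r + z * b

r = 0.05
for n_t in (0.0, -0.05, 0.2, 0.5):
    rel_err = np.max(np.abs(linear_factor(r, r * n_t) / exact_factor(r, n_t) - 1.0))
    print(f"n_t = {n_t:+.2f}: max relative error over ell = {rel_err:.3f}")
```

Because the model is linear in $(r, z)$ rather than in $(r, n_t)$, a Gaussian likelihood in the data stays Gaussian in $(r, z)$, which is the property exploited in Section IV.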

Now let us turn to the likelihood function. Its exact form can be found in previous works [13, 15, 18]. In the analytical approximation, it can be well approximated by [18]



$$-2\ln\mathcal{L} = \sum_{\ell}\sum_{Y}\left(\frac{D_\ell^Y - C_\ell^Y}{\sigma_{D_\ell^Y}}\right)^2, \tag{6}$$



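where $D_\ell^Y$ is the observed data and $\sigma_{D_\ell^Y}$ is the standard deviation of $D_\ell^Y$. Using Eqs. (4) and (5), the likelihood function can be rewritten as

$$-2\ln\mathcal{L} = \sum_{\ell}\sum_{Y}\left[d_\ell^Y - (r + r n_t b_\ell)\,a_\ell^Y\right]^2, \tag{7}$$

where we have defined the quantities

$$b_\ell = \ln(\ell/\ell_0), \qquad d_\ell^Y = \frac{D_\ell^Y - C_{\ell,s}^Y}{\sigma_{D_\ell^Y}}, \qquad a_\ell^Y = \frac{\tilde C_\ell^Y}{\sigma_{D_\ell^Y}}, \tag{8}$$

which are all independent of the variables $r$ and $n_t$. Clearly, the value of $d_\ell^Y$ depends on the data. Averaged over a large number of samples, $\langle d_\ell^Y\rangle = a_\ell^Y(\hat r + \hat r\hat n_t b_\ell)$, because

$$\langle D_\ell^Y\rangle = C_{\ell,s}^Y + C_{\ell,t}^Y(r = \hat r, n_t = \hat n_t), \qquad C_{\ell,t}^Y(r = \hat r, n_t = \hat n_t) \simeq \tilde C_\ell^Y(\hat r + \hat r\hat n_t b_\ell).$$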
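To make the definitions in Eq. (8) concrete, the sketch below builds a toy version of the problem with a single spectrum type $Y$: a hypothetical template $\tilde C_\ell$, a hypothetical $C_{\ell,s}$, and Gaussian noise with a fixed, known $\sigma_{D_\ell}$ (the assumption behind the Gaussian form of Eq. (6)). All spectra and noise levels are invented placeholders in arbitrary units, not Planck numbers. It checks that the sample average of $d_\ell$ approaches $a_\ell(\hat r + \hat r\hat n_t b_\ell)$ and evaluates Eq. (7).

```python
import numpy as np

rng = np.random.default_rng(1)

ell = np.arange(2, 51)
ell0 = 6.0                                     # assumed pivot multipole
b = np.log(ell / ell0)                         # b_ell = ln(ell / ell0)

# Hypothetical toy spectra (arbitrary units) for one spectrum type Y
C_s = 1.0 / ell                                # stands in for C_{ell,s}
C_tilde = 1.0 / ell**1.5                       # stands in for C_{ell,t}(r=1, n_t=0)
sigma = 0.02 / np.sqrt(ell)                    # stands in for sigma_{D_ell}

r_hat, nt_hat = 0.05, 0.0                      # input values
C_t = C_tilde * (r_hat + r_hat * nt_hat * b)   # linearized Eq. (5)

# Gaussian noise with fixed, known sigma: the assumption behind Eq. (6)
n_sims = 20_000
D = C_s + C_t + sigma * rng.standard_normal((n_sims, ell.size))

a = C_tilde / sigma                            # a_ell, Eq. (8)
d = (D - C_s) / sigma                          # d_ell, one row per simulated sample

def minus_two_ln_L(r, n_t, d_row):
    """Eq. (7) for one spectrum type: sum_ell [d - (r + r n_t b) a]^2."""
    return np.sum((d_row - (r + r * n_t * b) * a) ** 2)

mean_d = d.mean(axis=0)                        # sample average <d_ell>
expected = a * (r_hat + r_hat * nt_hat * b)    # a_ell (r_hat + r_hat nt_hat b_ell)
print(np.max(np.abs(mean_d - expected)))
```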



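Since we have adopted the best-pivot multipole $\ell_0$, which is defined by requiring $\sum_\ell\sum_Y (a_\ell^Y)^2 b_\ell = 0$ [18], the cross term between $r$ and $r n_t$ in Eq. (7) vanishes, and the likelihood can be rewritten as [18]

$$-2\ln\mathcal{L} = \left(\frac{r - r_p}{r_s}\right)^2 + \left(\frac{r n_t - z_p}{z_s}\right)^2 + C, \tag{9}$$

where $C$ is a constant and the other quantities are defined by

$$r_p = \frac{\sum_\ell\sum_Y a_\ell^Y d_\ell^Y}{\sum_\ell\sum_Y (a_\ell^Y)^2}, \qquad z_p = \frac{\sum_\ell\sum_Y a_\ell^Y d_\ell^Y b_\ell}{\sum_\ell\sum_Y (a_\ell^Y b_\ell)^2}, \tag{10}$$

$$r_s = \frac{1}{\sqrt{\sum_\ell\sum_Y (a_\ell^Y)^2}}, \qquad z_s = \frac{1}{\sqrt{\sum_\ell\sum_Y (a_\ell^Y b_\ell)^2}}. \tag{11}$$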
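Eqs. (9)-(11) are the result of completing the square in a linear least-squares problem. The sketch below uses a toy single-spectrum model with a hypothetical template and noise level (placeholders, not Planck values). It chooses $\ell_0$ from the best-pivot condition $\sum_\ell (a_\ell)^2 b_\ell = 0$, which gives $\ln\ell_0 = \sum_\ell (a_\ell)^2\ln\ell / \sum_\ell (a_\ell)^2$, then checks that $r_p$ and $z_p$ from Eqs. (10)-(11) coincide with a direct least-squares fit of $d_\ell$ on $(a_\ell, a_\ell b_\ell)$, and that Eq. (7) and Eq. (9) differ only by a constant.

```python
import numpy as np

rng = np.random.default_rng(2)

ell = np.arange(2, 51)
C_tilde = 1.0 / ell**1.5            # hypothetical C_{ell,t}(r=1, n_t=0)
sigma = 0.02 / np.sqrt(ell)         # hypothetical sigma_{D_ell}
a = C_tilde / sigma                 # a_ell

# Best-pivot condition sum_ell a^2 b = 0 with b = ln(ell / ell0)
ln_ell0 = np.sum(a**2 * np.log(ell)) / np.sum(a**2)
b = np.log(ell) - ln_ell0

r_hat, nt_hat = 0.05, 0.0
d = a * (r_hat + r_hat * nt_hat * b) + rng.standard_normal(ell.size)  # one toy sample

# Eqs. (10)-(11)
r_p = np.sum(a * d) / np.sum(a**2)
z_p = np.sum(a * b * d) / np.sum((a * b) ** 2)
r_s = 1.0 / np.sqrt(np.sum(a**2))
z_s = 1.0 / np.sqrt(np.sum((a * b) ** 2))

# Direct least squares on the linear model d ~ r a + z (a b)
design = np.column_stack([a, a * b])
(r_ls, z_ls), *_ = np.linalg.lstsq(design, d, rcond=None)

def chi2_eq7(r, z):
    """-2 ln L of Eq. (7), written in terms of z = r n_t."""
    return np.sum((d - (r + z * b) * a) ** 2)

def chi2_eq9(r, z):
    """Eq. (9) without the constant C."""
    return ((r - r_p) / r_s) ** 2 + ((z - z_p) / z_s) ** 2

print(f"ell0 = {np.exp(ln_ell0):.2f}, r_p = {r_p:.4f} +/- {r_s:.4f}, "
      f"z_p = {z_p:.4f} +/- {z_s:.4f}")
```

Choosing the pivot this way makes the normal equations diagonal, so $r$ and $z$ decouple exactly; with any other $\ell_0$ the two parameters would be correlated.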



The posterior pdf is related to the likelihood through the prior. Adopting flat priors on the parameters $r$ and $n_t$, the 2-dimensional posterior pdf is

$$-2\ln P(r, n_t) = \left(\frac{r - r_p}{r_s}\right)^2 + \left(\frac{r n_t - z_p}{z_s}\right)^2 + C, \tag{12}$$



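from which the 1-dimensional posterior pdf for $r$ follows by integrating over $n_t$. When the prior range of $n_t$ is wide enough to be treated as unbounded, $\int \exp\left[-(r n_t - z_p)^2/(2 z_s^2)\right] dn_t = \sqrt{2\pi}\,z_s/r$, which gives

$$P(r) = \frac{C'}{r}\exp\left[-\frac{1}{2}\left(\frac{r - r_p}{r_s}\right)^2\right], \tag{13}$$

where $C'$ is a normalization constant.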
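The origin of the factor $1/r$ in Eq. (13) can be checked numerically. The sketch below, with hypothetical values $z_p = z_s = 0.01$, integrates the second Gaussian of Eq. (12) over a wide $n_t$ grid (standing in for an unbounded range) and compares the result with $\sqrt{2\pi}\,z_s/r$; it also evaluates the same integral over the finite prior range $|n_t| \le 3$, which stays bounded as $r \to 0$.

```python
import numpy as np

z_p, z_s = 0.01, 0.01                       # hypothetical values
n_t = np.linspace(-50.0, 50.0, 200_001)     # fine grid standing in for an unbounded range
dn = n_t[1] - n_t[0]

def marginal_over_nt(r, nt_max):
    """Integral of exp[-(r n_t - z_p)^2 / (2 z_s^2)] over |n_t| <= nt_max (flat prior)."""
    mask = np.abs(n_t) <= nt_max
    return np.sum(np.exp(-0.5 * ((r * n_t[mask] - z_p) / z_s) ** 2)) * dn

for r in (0.2, 0.1, 0.05, 0.02, 0.001):
    print(f"r = {r:5.3f}: wide range ~ {marginal_over_nt(r, 50.0):8.3f}, "
          f"sqrt(2 pi) z_s / r = {np.sqrt(2 * np.pi) * z_s / r:8.3f}, "
          f"|n_t| <= 3: {marginal_over_nt(r, 3.0):6.3f}")
```

The marginal weight on small $r$ grows like $1/r$ simply because, at small $r$, a wider interval of $n_t$ is compatible with the data on $z = r n_t$.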



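We notice that when $r_p \gg r_s$, corresponding to $S/N \gg 1$ (see [18] for details), the Gaussian factor confines $r$ to within a few $r_s$ of $r_p$, where $1/r \simeq 1/r_p$, so this pdf reduces to

$$P(r) \simeq \frac{C'}{r_p}\exp\left[-\frac{1}{2}\left(\frac{r - r_p}{r_s}\right)^2\right]. \tag{14}$$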



This is a Gaussian function of $r$, peaked at $r = r_p$ with spread $r_s$. From the expression for $r_p$, we see that $r_p$ depends on the data $D_\ell^Y$ through the quantity $d_\ell^Y$. However, the average of $r_p$ over a large number of samples is $\langle r_p\rangle = \hat r$, i.e., $r_p$ is an unbiased estimator of $\hat r$. This was noted in our previous paper [18].

Here, however, we want to emphasize that when $r_p$ is not much larger than $r_s$, the peak of the posterior pdf in Eq. (13) lies below $r_p$ because of the factor $1/r$. In particular, when $r_p < 3 r_s$, the peak of the pdf is very close to zero, which is not an unbiased estimator of the input value $\hat r$. This explains what we found in the left panel of Fig. 1.
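The effect can be reproduced with a toy Monte Carlo built directly on Eqs. (9)-(16) instead of simulated Planck spectra. Because $r_p$ and $z_p$ are linear in the Gaussian data, under the assumptions of Eq. (6) they scatter as Gaussians around $\hat r$ and $\hat r\hat n_t$ with spreads $r_s$ and $z_s$. The sketch draws 500 such pairs and locates the peak of the 1-dimensional posterior of $r$ on a grid, once with flat priors on $(r, n_t)$ over $r \in [0, 1]$, $n_t \in [-3, 3]$, and once with flat priors on $(r, z)$, where $z$ has a fixed range so that its marginal factor does not depend on $r$. The values $r_s = 0.025$ (so that $\hat r = 0.05$ sits at $2 r_s$) and $z_s = 0.01$ are illustrative assumptions, not numbers from the Letter.

```python
import numpy as np

rng = np.random.default_rng(0)

R_HAT, NT_HAT = 0.05, 0.0      # input values of r and n_t
R_S, Z_S = 0.025, 0.01         # assumed spreads r_s and z_s (hypothetical)
N_SAMPLES = 500

r_grid = np.linspace(0.0, 1.0, 501)      # flat prior on r in [0, 1]
nt_grid = np.linspace(-3.0, 3.0, 301)    # flat prior on n_t in [-3, 3]
rr, nn = np.meshgrid(r_grid, nt_grid, indexing="ij")

def r_ml_flat_r_nt(r_p, z_p):
    """Peak of P(r) for flat priors on (r, n_t): Eq. (12) marginalized over n_t."""
    chi2 = ((rr - r_p) / R_S) ** 2 + ((rr * nn - z_p) / Z_S) ** 2
    post_r = np.exp(-0.5 * chi2).sum(axis=1)   # uniform n_t grid, so a plain sum
    return r_grid[np.argmax(post_r)]

def r_ml_flat_r_z(r_p):
    """Peak of P(r) for flat priors on (r, z): the Gaussian of Eq. (16) on [0, 1]."""
    post_r = np.exp(-0.5 * ((r_grid - r_p) / R_S) ** 2)
    return r_grid[np.argmax(post_r)]

# r_p and z_p are linear in Gaussian data, so they scatter around the inputs
r_p = rng.normal(R_HAT, R_S, N_SAMPLES)
z_p = rng.normal(R_HAT * NT_HAT, Z_S, N_SAMPLES)

ml_r_nt = np.array([r_ml_flat_r_nt(rp, zp) for rp, zp in zip(r_p, z_p)])
ml_r_z = np.array([r_ml_flat_r_z(rp) for rp in r_p])

for label, ml in (("flat (r, n_t)", ml_r_nt), ("flat (r, z)", ml_r_z)):
    print(f"{label:>13}: mean r_ML = {ml.mean():.3f}, "
          f"fraction with r_ML < 0.01 = {np.mean(ml < 0.01):.2f}")
```

With flat priors on $(r, n_t)$, a large fraction of the mock samples peak near $r = 0$ even though every one of them was generated with $\hat r = 0.05$, while the $(r, z)$ analysis keeps $r_{\rm ML}$ centred on the input value.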



IV. AVOIDING THE STATISTICAL BIAS 

Now let us consider a possible way to avoid this bias in the data analysis. Let us return to the likelihood function in Eq. (9). If we treat $r$ and $z = r n_t$ as two independent parameters, this likelihood is a simple Gaussian function of the uncorrelated parameters $r$ and $z$.



Now we adopt flat priors on the variables $r$ and $z$, and the posterior pdf for $r$ and $z$ becomes

$$-2\ln P(r, z) = \left(\frac{r - r_p}{r_s}\right)^2 + \left(\frac{z - z_p}{z_s}\right)^2 + C, \tag{15}$$



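from which it follows that the 1-dimensional posterior pdf for $r$ is

$$P(r) = C''\exp\left[-\frac{1}{2}\left(\frac{r - r_p}{r_s}\right)^2\right], \tag{16}$$

where $C''$ is a normalization constant.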



This pdf peaks at $r = r_p$, which is an unbiased estimator of the input value $\hat r$. Similarly, the 1-dimensional posterior pdf for $z$ peaks at $z = z_p$, which is also an unbiased estimator of $\hat z = \hat r\hat n_t$. So the statistical bias in the data analysis is elegantly avoided.

To show this result clearly, we have analyzed the same 500 samples adopting flat priors on $r$ and $z$. In Fig. 1 we plot the distributions of $r_{\rm ML}$ and $z_{\rm ML}$ with solid columns. As expected, $r_{\rm ML}$ and $z_{\rm ML}$ are distributed around their input values $\hat r = 0.05$ and $\hat z = 0$, and the bias in the tensor-to-scalar ratio is naturally avoided. In this figure we also plot the distribution of $n_{t,\rm ML}$, which is likewise distributed without bias around its input value $\hat n_t = 0$.

It is interesting to compare the prior $f(r, z)$ with the conventional prior $f(r, n_t)$. They are related by the Jacobian, i.e.,

$$f(r, n_t) = \left|\frac{\partial(r, z)}{\partial(r, n_t)}\right| f(r, z) = r\,f(r, z). \tag{17}$$

This relation shows that the flat prior $f(r, z) = 1$ corresponds exactly to $f(r, n_t) = r$. Therefore, compared with the analysis using the flat prior $f(r, n_t) = 1$, the new flat prior $f(r, z)$ gives more weight to larger values of $r$; this extra factor of $r$ cancels the $1/r$ factor in Eq. (13).
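Eq. (17) can be checked by sampling. The sketch assumes a flat prior on $(r, z)$ over $r \in [0, 1]$ and $z \in [-1, 1]$ (ranges chosen here only for illustration), maps the samples to $n_t = z/r$, and keeps the slice $|n_t| \le 1$, inside which $z = r n_t$ never reaches the edge of its range, so the box boundaries do not distort the result. Within that slice the induced density should be proportional to $r$: the mean of $r$ should be $2/3$ (a flat prior on $(r, n_t)$ would give $1/2$), while $n_t$ stays uniform.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# Flat prior on (r, z): r in [0, 1], z in [-1, 1]
r = rng.uniform(0.0, 1.0, n)
z = rng.uniform(-1.0, 1.0, n)
n_t = np.divide(z, r, out=np.full(n, np.inf), where=r > 0)   # n_t = z / r

# Keep the slice |n_t| <= 1; Eq. (17) predicts f(r, n_t) proportional to r there
keep = np.abs(n_t) <= 1.0
r_kept, nt_kept = r[keep], n_t[keep]

# A density 2r on [0, 1] has mean 2/3 and puts 3/4 of its mass at r > 1/2
print(f"mean r = {r_kept.mean():.3f}, P(r > 0.5) = {np.mean(r_kept > 0.5):.3f}, "
      f"mean n_t = {nt_kept.mean():+.3f}")
```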



V. CONCLUSION 

In this Letter, we have found a statistical bias in the CMB data analysis for the detection of RGWs when the signal-to-noise ratio is not very high. This bias could cause the RGW signal to be overlooked in CMB data analysis. We have explained why this bias exists using an analytical approximation of the likelihood function, and have shown that it can be elegantly avoided by adopting the orthogonalized parameters $r$ and $z = r n_t$ instead of the conventional parameters $r$ and $n_t$.

Finally, we emphasize that a similar statistical bias might exist in any data analysis [20] and should be treated carefully.